Does Baum-Welch Re-estimation Help Taggers?

نویسنده

  • David Elworthy
چکیده

In part of speech tagging by Hidden Markov Model, a statistical model is used to assign grammatical categories to words in a text. Early work in the field relied on a corpus which had been tagged by a human annotator to train the model. More recently, Cutt ing et al. (1992) suggest that training can be achieved with a minimal lexicon and a limited amount of a priori information about probabilities, by using an Baum-Welch re-estimation to automatically refine the model. In this paper, I report two experiments designed to determine how much manual training information is needed. The first experiment suggests that initial biasing of either lexical or transition probabilities is essential to achieve a good accuracy. The second experiment reveals that there are three distinct patterns of Baum-Welch reestimation. In two of the patterns, the re-estimation ult imately reduces the accuracy of the tagging rather than improving it. The pat tern which is applicable can be predicted from the quality of the initial model and the similarity between the tagged training corpus (if any) and the corpus to be tagged. Heuristics for deciding how to use re-estimation in an effective manner are given. The conclusions are broadly in agreement with those of Merialdo (1994), but give greater detail about the contributions of different parts of the model.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Algorithm for finding Noun Phrase Correspondences in Bilingual Corpora

94304 Abstract The paper describes an algorithm that employs English and French text taggers to associate noun phrases in an aligned bilingual corpus. The tag-gets provide part-of-speech categories which are used by finite-state recognizers to extract simple noun phrases for both languages. Noun phrases are then mapped to each other using an iterative re-estimation algorithm that bears similari...

متن کامل

Automatic Error Detection in Part of Speech Tagging

A technique for detecting errors made by Hidden Markov Model taggers is described, based on comparing observable values of the tagging process with a threshold. The resulting approach allows the accuracy of the tagger to be improved by accepting a lower eeciency, deened as the proportion of words which are tagged. Empirical observations are presented which demonstrate the validity of the techni...

متن کامل

Estimating rate constants in hidden Markov models by the EM algorithm

The EM algorithm, e.g., the Baum–Welch re-estimation, is an important tool for parameter estimation in discrete-time hidden Markov models. We present a direct re-estimation of rate constants for applications in which the underlying Markov process is continuous in time. Previous estimation of discrete-time transition probabilities is not necessary.

متن کامل

Comparing the Bidirectional Baum-Welch Algorithm and the Baum-Welch Algorithm on Regular Lattice

A profile hidden Markov model (PHMM) is widely used in assigning protein sequences to protein families. In this model, the hidden states only depend on the previous hidden state and observations are independent given hidden states. In other words, in the PHMM, only the information of the left side of a hidden state is considered. However, it makes sense that considering the information of the b...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1994